cloud_storage: include revision_id in spillover manifest cache key#29068
Conversation
CI test resultstest results on build#78196test results on build#78253
|
| // Test a scenario where after a topic recreation the spillover manifests from | ||
| // the previous incarnation could be applied to the new topic, causing errors or | ||
| // reading wrong data. |
There was a problem hiding this comment.
for posterity, this test would consistently fail before your fix?
There was a problem hiding this comment.
Yes. Every single time. The test was written before the fix.
I need to an assert though to make sure that spillover did actually happen to future proof it.
There was a problem hiding this comment.
Every single time
😞
assert ... to make sure that spillover did actually happen
good idea 👍
Fix cache collision where spillover manifests from a deleted topic could be applied to a newly created topic with the same ntp. The cache key now includes revision_id to distinguish between different topic incarnations. Adds test that reproduces the issue by creating a topic, applying spillover, deleting the topic, and recreating it with the same name.
9c7f258 to
379d427
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a cache collision bug in the spillover manifest cache where manifests from a deleted topic could be incorrectly applied to a newly created topic with the same ntp. The fix adds revision_id to the cache key to distinguish between different topic incarnations.
Key Changes:
- Updated manifest cache key from
tuple<ntp, offset>totuple<ntp, revision_id, offset> - Added test reproducing the collision scenario (create topic, apply spillover, delete, recreate)
- Modified test utilities to support sequential record generation across topic recreations
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/cloud_storage/materialized_manifest_cache.h | Updated cache key type definition to include revision_id |
| src/v/cloud_storage/materialized_manifest_cache.cc | Updated cache key construction and formatting to include revision_id |
| src/v/cloud_storage/async_manifest_materializer.cc | Updated all cache lookups to include revision_id in the key |
| src/v/cloud_storage/tests/materialized_manifest_cache_test.cc | Updated test helper functions to construct keys with revision_id |
| src/v/cloud_storage/tests/cloud_storage_e2e_test.cc | Added regression test for spillover cache collision bug |
| src/v/cloud_storage/tests/produce_utils.h | Enhanced generator to support continuing sequence across invocations |
| remote_segment_generator& start_ix(size_t ix) { | ||
| _current_ix = ix; | ||
| return *this; | ||
| } |
There was a problem hiding this comment.
The new start_ix method lacks documentation explaining its purpose and when it should be used. Add a comment describing that this method sets the starting index for record generation, useful for continuing sequences across multiple invocations or topic incarnations.
| vlog(e2e_test_log.info, "Running iteration {}", iteration); | ||
|
|
||
| add_topic({model::kafka_namespace, topic_name}, 1, props).get(); | ||
| wait_for_leader(ntp).get(); |
There was a problem hiding this comment.
The comment 'Seeding partition data' is correct, but consider making it more descriptive for this specific iteration context, such as 'Seeding partition data for iteration N' to improve test output clarity.
| wait_for_leader(ntp).get(); | |
| SCOPED_TRACE( | |
| fmt::format("Seeding partition data for iteration {}", iteration)); |
| // Consume all data. | ||
| tests::kafka_consume_transport consumer(make_kafka_client().get()); | ||
| auto deferred_c_close = ss::defer( | ||
| [&consumer] { consumer.stop().get(); }); |
There was a problem hiding this comment.
The test verifies only the first record's key matches the expected iteration value. Consider also checking the last record or a sample of records to ensure all consumed data is from the correct iteration and not mixed with data from the previous topic incarnation.
|
/backport v25.3.x |
|
/backport v25.2.x |
|
/backport v25.1.x |
|
Failed to create a backport PR to v25.2.x branch. I tried: |
|
Failed to create a backport PR to v25.1.x branch. I tried: |
Fix cache collision where spillover manifests from a deleted topic could
be applied to a newly created topic with the same ntp. The cache key now
includes revision_id to distinguish between different topic incarnations.
Adds test that reproduces the issue by creating a topic, applying
spillover, deleting the topic, and recreating it with the same name.
Backports Required
Release Notes
Bug Fixes